-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Optimize IP field parsing #132463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize IP field parsing #132463
Conversation
This looks a nice improvement. Do you have an idea how this impacts indexing performance? For example by running |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Based on profiles I captured for 30s of a run of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice improvement.
Just to double check that I'm reading it correctly: for parsers that don't support optimizedText()
, this will still end up doing just a single String-to-bytes conversion in the parser itself, right?
server/src/main/java/org/elasticsearch/index/mapper/ESInetAddressPoint.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/ESInetAddressPoint.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the changes to InetAddresses.
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR would greatly benefit from some JMH benchmarks to show the differences and guide some development choices (e.g. specialized functions for String vs byte[])
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Outdated
Show resolved
Hide resolved
The approach I took here was to conduct metic ingestion benchmarks and analyzing cpu and allocation flame graphs to have a better understanding of the real-world impact outside of narrow microbenchmarks. |
Sorry, I think I misinterpreted your suggestion. It makes sense to compare the performance of the String-based methods before and after this change to see what the difference is. It may actually not be a regression because of the other optimizations. Let's see. |
I've added benchmarks and compared the before and after. To summarize, the throughput is higher in all scenarios after the changes proposed in this PR. The only regression is that there are a more allocations when handling IP4v addresses.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Looks very good indeed! |
Optimizes IP field parsing in the following ways:
XContentParser#optimizedTextOrNull
to avoid the UTF-8 to Java String conversion overhead.ipString.split(":")
in favor of a more efficient algorithm that iterates over all character bytes only once.InetAddress
. This requires creating anESInetAddressPoint
class that's similar to Lucene'sInetAddressPoint
as the latter can only be constructed via anInetAddress
.The semantics are kept in tact as-is and all IP parsing related test are still passing.
This could potentially hurt the performance of code paths that don't have access to an UTF-8 encoded byte array (but have a String) as the String will now need to be converted to a byte array first. However, these code paths don't seem to be as performance sensitive from a first glance.